Citation Request:
This dataset is publicly available for research. The details are described in [Cortez et al., 2009].
Please include this citation if you plan to use this database:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib
Relevant Information:
The datasets are related to white variants of the Portuguese “Vinho Verde” wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.). These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant, so it could be interesting to test feature selection methods.
Attribute information:
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Description of attributes:
1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily).
2 - volatile acidity: the amount of acetic acid in wine, which at too high a level can lead to an unpleasant, vinegar taste.
3 - citric acid: found in small quantities, citric acid can add ‘freshness’ and flavor to wines.
4 - residual sugar: the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet.
5 - chlorides: the amount of salt in the wine.
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine.
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine.
8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content.
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale.
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant.
11 - alcohol: the percent alcohol content of the wine.
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
##   X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1           7.0             0.27        0.36           20.7     0.045
## 2 2           6.3             0.30        0.34            1.6     0.049
## 3 3           8.1             0.28        0.40            6.9     0.050
## 4 4           7.2             0.23        0.32            8.5     0.058
## 5 5           7.2             0.23        0.32            8.5     0.058
## 6 6           8.1             0.28        0.40            6.9     0.050
##   free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates alcohol
## 1                  45                  170  1.0010 3.00      0.45     8.8
## 2                  14                  132  0.9940 3.30      0.49     9.5
## 3                  30                   97  0.9951 3.26      0.44    10.1
## 4                  47                  186  0.9956 3.19      0.40     9.9
## 5                  47                  186  0.9956 3.19      0.40     9.9
## 6                  30                   97  0.9951 3.26      0.44    10.1
##   quality
## 1       6
## 2       6
## 3       6
## 4       6
## 5       6
## 6       6
##         X fixed.acidity volatile.acidity citric.acid residual.sugar
## 4893 4893           6.5             0.23        0.38            1.3
## 4894 4894           6.2             0.21        0.29            1.6
## 4895 4895           6.6             0.32        0.36            8.0
## 4896 4896           6.5             0.24        0.19            1.2
## 4897 4897           5.5             0.29        0.30            1.1
## 4898 4898           6.0             0.21        0.38            0.8
##      chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
## 4893     0.032                  29                  112 0.99298 3.29
## 4894     0.039                  24                   92 0.99114 3.27
## 4895     0.047                  57                  168 0.99490 3.15
## 4896     0.041                  30                  111 0.99254 2.99
## 4897     0.022                  20                  110 0.98869 3.34
## 4898     0.020                  22                   98 0.98941 3.26
##      sulphates alcohol quality
## 4893      0.54     9.7       5
## 4894      0.50    11.2       6
## 4895      0.46     9.6       5
## 4896      0.46     9.4       6
## 4897      0.38    12.8       7
## 4898      0.32    11.8       6
## [1] 4898 13
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
## 'data.frame': 4898 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
## [1] 0
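The structure checks above (dimensions, column names, `str()`, and what is presumably a missing-value count of 0) can be sketched as follows. This is a minimal sketch rather than the original code: it rebuilds only the six rows printed by `head()`, so `dim()` reports 6 rows instead of 4898, and it assumes the lone `0` came from something like `sum(is.na(wine_df))`.

```r
# First six rows, copied from the head() output above (not the full dataset).
wine_df <- data.frame(
  X = 1:6,
  fixed.acidity = c(7.0, 6.3, 8.1, 7.2, 7.2, 8.1),
  volatile.acidity = c(0.27, 0.30, 0.28, 0.23, 0.23, 0.28),
  citric.acid = c(0.36, 0.34, 0.40, 0.32, 0.32, 0.40),
  residual.sugar = c(20.7, 1.6, 6.9, 8.5, 8.5, 6.9),
  chlorides = c(0.045, 0.049, 0.050, 0.058, 0.058, 0.050),
  free.sulfur.dioxide = c(45, 14, 30, 47, 47, 30),
  total.sulfur.dioxide = c(170, 132, 97, 186, 186, 97),
  density = c(1.0010, 0.9940, 0.9951, 0.9956, 0.9956, 0.9951),
  pH = c(3.00, 3.30, 3.26, 3.19, 3.19, 3.26),
  sulphates = c(0.45, 0.49, 0.44, 0.40, 0.40, 0.44),
  alcohol = c(8.8, 9.5, 10.1, 9.9, 9.9, 10.1),
  quality = c(6L, 6L, 6L, 6L, 6L, 6L)
)

dim(wine_df)    # number of rows and columns
names(wine_df)  # column names
str(wine_df)    # type and first values of each column

# Assumed check behind the "[1] 0" line: total count of missing values.
n_missing <- sum(is.na(wine_df))
n_missing
```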
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.878 6.000 9.000
##
## 3 4 5 6 7 8 9
## 20 163 1457 2198 880 175 5
##
## Low Medium High
## 183 3655 1060
## wine_df$quality.rate: Low
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 4.000 4.000 3.891 4.000 4.000
## --------------------------------------------------------
## wine_df$quality.rate: Medium
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.000 5.000 6.000 5.601 6.000 6.000
## --------------------------------------------------------
## wine_df$quality.rate: High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.000 7.000 7.000 7.175 7.000 9.000
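The grouping into `quality.rate` can be reproduced from the counts shown above. The sketch below assumes the cut points were 3 to 4 for Low, 5 to 6 for Medium and 7 to 9 for High, which is consistent with the printed counts (183, 3655, 1060) and group ranges, though the report does not show the code that was actually used. The `quality` column is rebuilt from the frequency table, not read from the data file.

```r
# Rebuild the quality column from the frequency table printed above.
scores <- 3:9
counts <- c(20, 163, 1457, 2198, 880, 175, 5)
wine_df <- data.frame(quality = rep(scores, times = counts))

# Assumed cut points: (0,4] -> Low, (4,6] -> Medium, (6,10] -> High.
wine_df$quality.rate <- cut(
  wine_df$quality,
  breaks = c(0, 4, 6, 10),
  labels = c("Low", "Medium", "High")
)

rate_counts <- table(wine_df$quality.rate)
rate_counts

# Summary of the quality score within each category.
by(wine_df$quality, wine_df$quality.rate, summary)
```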
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.50 10.40 10.51 11.40 14.20
An alcohol content of about 9.5% by volume has the highest count among the white wines.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
## [1] "Summary of fixed.acidity"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.855 7.300 14.200
## [1] "Summary of volatile.acidity"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2100 0.2600 0.2782 0.3200 1.1000
## [1] "Summary of citric.acid"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2700 0.3200 0.3342 0.3900 1.6600
Volatile acidity is also approximately normally distributed, with outliers visible on the higher end of the scale (maximum 1.1000 g/dm^3). The median value is 0.2600 g/dm^3.
Some white wines have 0 g/dm^3 of citric acid. The distribution is right-skewed, with a maximum outlier of 1.6600 g/dm^3 and a median of 0.3200 g/dm^3. In the white wine dataset, citric acid is the only variable that takes a zero value, and it does so in only a very small percentage of observations.
## [1] "Summary of chlorides"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
## [1] "Summary of free.sulfur.dioxide"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 23.00 34.00 35.31 46.00 289.00
## [1] "Summary of total.sulfur.dioxide"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 108.0 134.0 138.4 167.0 440.0
## [1] "Summary of density"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9917 0.9937 0.9940 0.9961 1.0390
## [1] "Summary of pH"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.090 3.180 3.188 3.280 3.820
## [1] "Summary of sulphates"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4700 0.4898 0.5500 1.0800
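Per-variable summaries like those above can be produced with a small loop. The following is a sketch under assumptions: the helper name `summarise_columns` is hypothetical, the label format is inferred from the printed `"Summary of ..."` lines, and the data are only the first six rows of three columns, so the numbers differ from the full-dataset output.

```r
# Three columns from the first six rows shown by head() (illustrative subset).
wine_df <- data.frame(
  fixed.acidity = c(7.0, 6.3, 8.1, 7.2, 7.2, 8.1),
  volatile.acidity = c(0.27, 0.30, 0.28, 0.23, 0.23, 0.28),
  citric.acid = c(0.36, 0.34, 0.40, 0.32, 0.32, 0.40)
)

# Hypothetical helper: print a labelled summary for each requested column
# and return the summaries invisibly as a named list.
summarise_columns <- function(df, columns) {
  out <- list()
  for (col in columns) {
    print(paste("Summary of", col))
    out[[col]] <- summary(df[[col]])
    print(out[[col]])
  }
  invisible(out)
}

summaries <- summarise_columns(wine_df, names(wine_df))
```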
## [1] 4898 14
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1 Min. : 3.800 Min. :0.0800 Min. :0.0000
## 1st Qu.:1225 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700
## Median :2450 Median : 6.800 Median :0.2600 Median :0.3200
## Mean :2450 Mean : 6.855 Mean :0.2782 Mean :0.3342
## 3rd Qu.:3674 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900
## Max. :4898 Max. :14.200 Max. :1.1000 Max. :1.6600
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.600 Min. :0.00900 Min. : 2.00
## 1st Qu.: 1.700 1st Qu.:0.03600 1st Qu.: 23.00
## Median : 5.200 Median :0.04300 Median : 34.00
## Mean : 6.391 Mean :0.04577 Mean : 35.31
## 3rd Qu.: 9.900 3rd Qu.:0.05000 3rd Qu.: 46.00
## Max. :65.800 Max. :0.34600 Max. :289.00
## total.sulfur.dioxide density pH sulphates
## Min. : 9.0 Min. :0.9871 Min. :2.720 Min. :0.2200
## 1st Qu.:108.0 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100
## Median :134.0 Median :0.9937 Median :3.180 Median :0.4700
## Mean :138.4 Mean :0.9940 Mean :3.188 Mean :0.4898
## 3rd Qu.:167.0 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500
## Max. :440.0 Max. :1.0390 Max. :3.820 Max. :1.0800
## alcohol quality quality.rate
## Min. : 8.00 Min. :3.000 Low : 183
## 1st Qu.: 9.50 1st Qu.:5.000 Medium:3655
## Median :10.40 Median :6.000 High :1060
## Mean :10.51 Mean :5.878
## 3rd Qu.:11.40 3rd Qu.:6.000
## Max. :14.20 Max. :9.000
01. In white wine, the ‘quality’ variable has a median of 6 and a mean of 5.878. Quality scores range from 3 (lowest) to 9 (highest), and most wines are rated 5 or 6.
02. Fixed acidity is comparatively high, ranging from 3.8 to 14.2 g/dm^3, while volatile acidity ranges from 0.08 to 1.1 g/dm^3 and citric acid from 0 to 1.66 g/dm^3.
03. Most pH values lie between 3.0 and 3.3.
04. All of the features have a minimum value greater than 0 except for citric acid.
05. Fixed acidity and residual sugar have the highest medians of the variables measured in g/dm^3.
06. The alcohol content of the white wines ranges from 8.00% to 14.20% by volume.
07. After creating the new variable ‘quality.rate’, the dataset has 14 variables in total.
Density, fixed acidity, volatile acidity and citric acid also offer different views for the analysis. It is also useful to see how the variables of the dataset relate to one another in the bivariate and multivariate analyses.
## [,1]
## fixed.acidity -0.113662831
## volatile.acidity -0.194722969
## citric.acid -0.009209091
## residual.sugar -0.097576829
## chlorides -0.209934411
## free.sulfur.dioxide 0.008158067
## total.sulfur.dioxide -0.174737218
## density -0.307123313
## pH 0.099427246
## sulphates 0.053677877
## alcohol 0.435574715
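A one-column matrix of correlations with `quality`, as printed above, is what `cor()` returns when given a data frame of predictors and a single vector. The sketch below assumes that was the approach. It uses only the six rows shown by `tail()` (the `head()` rows all have quality 6, so their correlation with quality would be undefined), and the resulting values are illustrative, not the full-dataset figures.

```r
# Last six rows, copied from the tail() output (not the full dataset).
wine_df <- data.frame(
  fixed.acidity = c(6.5, 6.2, 6.6, 6.5, 5.5, 6.0),
  volatile.acidity = c(0.23, 0.21, 0.32, 0.24, 0.29, 0.21),
  citric.acid = c(0.38, 0.29, 0.36, 0.19, 0.30, 0.38),
  residual.sugar = c(1.3, 1.6, 8.0, 1.2, 1.1, 0.8),
  chlorides = c(0.032, 0.039, 0.047, 0.041, 0.022, 0.020),
  free.sulfur.dioxide = c(29, 24, 57, 30, 20, 22),
  total.sulfur.dioxide = c(112, 92, 168, 111, 110, 98),
  density = c(0.99298, 0.99114, 0.99490, 0.99254, 0.98869, 0.98941),
  pH = c(3.29, 3.27, 3.15, 2.99, 3.34, 3.26),
  sulphates = c(0.54, 0.50, 0.46, 0.46, 0.38, 0.32),
  alcohol = c(9.7, 11.2, 9.6, 9.4, 12.8, 11.8),
  quality = c(5L, 6L, 5L, 6L, 7L, 6L)
)

# Pearson correlation of every predictor with quality (11 x 1 matrix).
predictors <- setdiff(names(wine_df), "quality")
quality_cor <- cor(wine_df[, predictors], wine_df$quality)
quality_cor
```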
## wine_df$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.587 4.600 6.393 10.700 16.200
## --------------------------------------------------------
## wine_df$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.300 2.500 4.628 7.100 17.550
## --------------------------------------------------------
## wine_df$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.800 7.000 7.335 11.500 23.500
## --------------------------------------------------------
## wine_df$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.700 5.300 6.442 9.900 65.800
## --------------------------------------------------------
## wine_df$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.700 3.650 5.186 7.325 19.250
## --------------------------------------------------------
## wine_df$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.800 2.100 4.300 5.671 8.200 14.800
## --------------------------------------------------------
## wine_df$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.60 2.00 2.20 4.12 4.20 10.60
## wine_df$quality.rate: Low
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.40 10.10 10.17 10.80 13.50
## --------------------------------------------------------
## wine_df$quality.rate: Medium
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.40 10.00 10.27 11.00 14.00
## --------------------------------------------------------
## wine_df$quality.rate: High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 10.70 11.50 11.42 12.40 14.20
## wine_df$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.870 3.035 3.215 3.188 3.325 3.550
## --------------------------------------------------------
## wine_df$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.830 3.070 3.160 3.183 3.280 3.720
## --------------------------------------------------------
## wine_df$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.790 3.080 3.160 3.169 3.240 3.790
## --------------------------------------------------------
## wine_df$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.080 3.180 3.189 3.280 3.810
## --------------------------------------------------------
## wine_df$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.840 3.100 3.200 3.214 3.320 3.820
## --------------------------------------------------------
## wine_df$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.940 3.120 3.230 3.219 3.330 3.590
## --------------------------------------------------------
## wine_df$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.200 3.280 3.280 3.308 3.370 3.410
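Grouped summaries such as the ones above have the shape produced by `by()`. A minimal sketch on the six rows shown by `tail()` follows. The `quality.rate` cut points are assumed (3 to 4 Low, 5 to 6 Medium, 7 to 9 High), and `droplevels()` is used only because this small sample contains no Low wines.

```r
# Three columns from the last six rows shown by tail() (illustrative subset).
wine_df <- data.frame(
  alcohol = c(9.7, 11.2, 9.6, 9.4, 12.8, 11.8),
  pH = c(3.29, 3.27, 3.15, 2.99, 3.34, 3.26),
  quality = c(5L, 6L, 5L, 6L, 7L, 6L)
)

# Assumed cut points; the empty "Low" level is dropped for this sample.
wine_df$quality.rate <- droplevels(cut(
  wine_df$quality,
  breaks = c(0, 4, 6, 10),
  labels = c("Low", "Medium", "High")
))

# Full summaries per group, in the same layout as the output above.
by(wine_df$alcohol, wine_df$quality.rate, summary)
by(wine_df$pH, wine_df$quality, summary)

# Single statistics per group, convenient for comparisons.
alcohol_means <- tapply(wine_df$alcohol, wine_df$quality.rate, mean)
ph_medians <- tapply(wine_df$pH, wine_df$quality, median)
```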
## [1] "Correlation with pH and alcohol"
##
## Pearson's product-moment correlation
##
## data: alcohol and pH
## t = 8.5601, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.09374446 0.14893205
## sample estimates:
## cor
## 0.1214321
## [1] "Correlation with density and alcohol"
##
## Pearson's product-moment correlation
##
## data: alcohol and density
## t = -87.255, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7908646 -0.7689315
## sample estimates:
## cor
## -0.7801376
## [1] "Correlation with density and residual sugar"
##
## Pearson's product-moment correlation
##
## data: residual.sugar and density
## t = 107.87, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8304732 0.8470698
## sample estimates:
## cor
## 0.8389665
## [1] "Correlation with sulphates and free.sulfur.dioxide"
##
## Pearson's product-moment correlation
##
## data: free.sulfur.dioxide and sulphates
## t = 4.1508, df = 4896, p-value = 3.369e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03126264 0.08707928
## sample estimates:
## cor
## 0.05921725
## [1] "Correlation with sulphates and total.sulfur.dioxide"
##
## Pearson's product-moment correlation
##
## data: total.sulfur.dioxide and sulphates
## t = 9.5019, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1069590 0.1619585
## sample estimates:
## cor
## 0.1345624
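The t statistics reported by these tests follow from the correlation estimate and the sample size: t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below recomputes two of the reported values with a hypothetical helper `t_from_r`, then runs `cor.test()` on the six rows shown by `tail()` to illustrate the call pattern. The `with()` form is an assumption based on the `data: alcohol and density` label in the output.

```r
# Hypothetical helper: t statistic of Pearson's test from r and sample size n.
t_from_r <- function(r, n) {
  r * sqrt(n - 2) / sqrt(1 - r^2)
}

# Estimates reported above, with n = 4898 observations.
t_density <- t_from_r(-0.7801376, 4898)  # alcohol vs density, about -87.255
t_ph <- t_from_r(0.1214321, 4898)        # alcohol vs pH, about 8.5601

# Same call pattern on six rows from tail(); the result is illustrative only.
sample_df <- data.frame(
  alcohol = c(9.7, 11.2, 9.6, 9.4, 12.8, 11.8),
  density = c(0.99298, 0.99114, 0.99490, 0.99254, 0.98869, 0.98941)
)
ct <- with(sample_df, cor.test(alcohol, density))
ct
```

With nearly 4900 observations, even a weak correlation such as 0.059 yields a very small p-value, so the size of the estimate matters more than its significance here.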
As seen above, free.sulfur.dioxide, pH, sulphates and alcohol are positively correlated with quality, although only alcohol (0.436) shows a sizeable correlation; citric.acid is essentially uncorrelated with quality (-0.009). The results suggest that good wines also tend to have high free sulfur dioxide and a high ratio of free to total sulfur dioxide.
Wines with a quality of 5 to 6 show a wide range of citric acid and a relatively low alcohol percentage, while high-quality wines tend to have low citric acid.
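The ratio of free to total sulfur dioxide mentioned above is not a column in the dataset, so it has to be derived. A minimal sketch, assuming a hypothetical column name `free.total.ratio` and using only the six rows shown by `tail()`:

```r
# Three columns from the last six rows shown by tail() (illustrative subset).
wine_df <- data.frame(
  free.sulfur.dioxide = c(29, 24, 57, 30, 20, 22),
  total.sulfur.dioxide = c(112, 92, 168, 111, 110, 98),
  quality = c(5L, 6L, 5L, 6L, 7L, 6L)
)

# Hypothetical derived column: share of total SO2 that is in free form.
wine_df$free.total.ratio <-
  wine_df$free.sulfur.dioxide / wine_df$total.sulfur.dioxide

# Mean ratio per quality score; six rows are far too few to draw conclusions.
ratio_by_quality <- tapply(wine_df$free.total.ratio, wine_df$quality, mean)
ratio_by_quality
```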
We observed the following relationships during our investigation:
01. We find very interesting relations between alcohol, residual sugar, total and free sulfur dioxide, sulphates, density and quality.
02. Alcohol increases as the quality of the wine increases, while density and residual sugar tend to decrease (correlations with quality: 0.436, -0.307 and -0.098).
03. Free sulfur dioxide has a very weak positive relation with quality (0.008), whereas total sulfur dioxide is negatively correlated with it (-0.175).
04. In the white wine dataset, density and alcohol are negatively correlated. Higher-quality wines have a higher alcohol percentage and a lower density.
05. Higher-quality white wines tend to have slightly higher levels of sulphates.
06. We have seen how alcohol and volatile acidity relate to quality. Higher alcohol and lower volatile acidity generally give better-quality wines.
## [1] "Alcohol correlation with Quality"
## [1] 0.4355747
## [1] "Alcohol correlation with Other variables"
## [,1]
## fixed.acidity -0.12088112
## volatile.acidity 0.06771794
## citric.acid -0.07572873
## residual.sugar -0.45063122
## chlorides -0.36018871
## free.sulfur.dioxide -0.25010394
## total.sulfur.dioxide -0.44889210
## density -0.78013762
## pH 0.12143210
## sulphates -0.01743277
## alcohol 1.00000000
01. Alcohol, density and residual sugar have the strongest correlations with one another (density and alcohol: -0.780; density and residual sugar: 0.839).
02. The correlations of volatile acidity, pH and quality with alcohol surprised us.
03. Quality is divided into three categories (low, medium and high), with ratings ranging from 3 to 9. Wines rated 5 and 6, the medium category, account for the largest number of entries in the dataset. Quality may depend on the taste of the consumer, but the best wines are rated 8 to 9.
04. Total and free sulfur dioxide show weak positive correlations with sulphates (0.135 and 0.059) and are also correlated with each other.
05. The alcohol level of a wine decreases as its residual sugar level grows (correlation -0.451). Alcohol also plays a key role in this investigation, as we already observed in our scatterplots.
06. The white wine data have no missing values and several types of variables to explore.
07. Most of the plotted boxplots show approximately normal distributions, and most of the scatterplots show the relationships described in the analysis.
We struggled somewhat with analysing the outliers because there are many of them, but we worked through it. For further investigation we would need more data, along with additional statistical analysis and multivariate exploration focused on alcohol and wine quality. Categorising the alcohol percentage would also allow some extra analysis.